Smart Paper Notes

Semantically-enhanced Topic Recommendation System for Software Projects

Maliheh Izadi, Mahtab Nejati, Abbas Heydarnoori

Category: Machine Learning

2022-05-31

Software-related platforms have enabled their users to collaboratively label software entities with topics. Tagging software repositories with relevant topics can be exploited to facilitate various downstream tasks. For instance, a correct and complete set of topics assigned to a repository can increase its visibility, which in turn improves the outcome of tasks such as browsing, searching, navigating, and organizing repositories. Unfortunately, assigned topics are usually highly noisy, and some repositories do not have well-assigned topics. Thus, there have been efforts to recommend topics for software projects; however, the semantic relationships among these topics have not been exploited so far. We propose two recommender models for tagging software projects that incorporate the semantic relationships among topics. Our approach has two main phases. (1) We first take a collaborative approach to curate a dataset of quality topics specifically for the domain of software engineering and development. We also enrich this data with the semantic relationships among these topics and encapsulate them in a knowledge graph we call SED-KGraph. (2) We then build two recommender systems. The first operates only on the list of original topics assigned to a repository and the relationships specified in our knowledge graph. The second predictive model, however, assumes that no topics are available for a repository, so it predicts the relevant topics from both the textual information of a software project and SED-KGraph. We built SED-KGraph in a crowd-sourced project with 170 contributors from both academia and industry. The experimental results indicate that our solutions outperform baselines that neglect the semantic relationships among topics by at least 25% and 23% in terms of the ASR and MAP metrics, respectively.
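
The abstract reports its gains in terms of MAP (mean average precision). As a rough illustration of that metric only, here is a minimal Python sketch using one common definition of average precision with an optional cutoff k. The paper's exact variant, and the definition of its ASR metric, are not given in this summary, so treat this as an assumption; the function names and the topic lists in the example are hypothetical.

```python
def average_precision(recommended, relevant, k=None):
    """Average precision of one ranked topic list (one common definition).

    Sums precision@i at every rank i where a relevant topic appears, then
    divides by the number of relevant topics (capped at k when k is given).
    """
    if k is not None:
        recommended = recommended[:k]
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, topic in enumerate(recommended, start=1):
        if topic in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    denom = min(len(relevant), k) if k is not None else len(relevant)
    return precision_sum / denom


def mean_average_precision(all_recommended, all_relevant, k=None):
    """MAP: the mean of per-repository average precision scores."""
    scores = [
        average_precision(rec, rel, k)
        for rec, rel in zip(all_recommended, all_relevant)
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical example: two repositories with ranked topic recommendations.
recs = [["python", "machine-learning", "web", "django"], ["java", "android"]]
truth = [{"python", "django"}, {"android"}]
print(mean_average_precision(recs, truth))  # 0.625
```

The first repository scores (1/1 + 2/4) / 2 = 0.75 and the second 1/2 = 0.5, so MAP is 0.625. Because precision is measured at each hit, placing relevant topics higher in the ranking raises the score even when the set of recommended topics is unchanged.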

translated by Google Translate

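Information spread is an interesting topic in network science research that studies how information, influence, or contagion propagates through a network. Graph burning is a simplified deterministic model of how information spreads in a network. Because the problem is NP-complete, it is computationally hard to solve with exact algorithms. Therefore, many heuristics and approximation algorithms have been proposed in the literature for the graph burning problem. In this paper, we propose an efficient genetic algorithm called Centrality BAsed Genetic-algorithm (CBAG) to solve the graph burning problem. Considering the unique characteristics of the graph burning problem, we introduce novel genetic operators, a chromosome representation, and an evaluation method. In the proposed algorithm, well-known centralities are used as the backbone of our chromosome initialization procedure. The proposed algorithm was implemented and compared with previous heuristics and approximation algorithms on 15 benchmark graphs of different sizes. The results show that the proposed algorithm achieves better performance than previous state-of-the-art heuristics. The complete source code is available online and can be used to find optimal or near-optimal solutions for the graph burning problem.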


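The novel coronavirus has caused more than one million deaths and continues to spread rapidly. The virus targets the lungs, causing respiratory distress that can be mild or severe. X-ray or computed tomography (CT) images of the lungs can reveal whether a patient is infected with COVID-19. Many researchers are trying to improve COVID-19 detection using artificial intelligence. Our motivation is to develop an automatic method that can cope with scenarios in which preparing labeled data is time-consuming or expensive. In this paper, we propose Semi-supervised Classification using Limited Labeled Data (SCLLD), which relies on Sobel edge detection and generative adversarial networks (GANs), to automate COVID-19 diagnosis. In this work, the output of the GAN discriminator is a probability value that is used for classification. The proposed system is trained on 10,000 CT scans collected from Omid Hospital, and a public dataset is also used to validate the system. The method is compared with other state-of-the-art supervised methods, such as Gaussian processes. To the best of our knowledge, this is the first semi-supervised method proposed for COVID-19 detection. Our system can learn from a mixture of limited labeled and unlabeled data, a setting in which supervised learners fail because there is not enough labeled data. Consequently, when labeled training data is scarce, our semi-supervised training method significantly outperforms supervised training of a convolutional neural network (CNN). The 95% confidence intervals of our method for accuracy, sensitivity, and specificity are 99.56 ± 0.20%, 99.88 ± 0.24%, and 99.40 ± 0.1.18%, respectively, whereas those of the CNN (trained in a supervised manner) are 68.34 ± 4.11%, 91.2 ± 6.15%, and 46.40 ± 5.21%.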
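
The abstract names Sobel edge detection as one ingredient of SCLLD but does not say how it is connected to the GAN. The sketch below therefore illustrates only the Sobel operator itself. It applies the standard 3x3 horizontal and vertical Sobel kernels to a small grayscale grid and returns the gradient magnitude at each interior pixel. The function name and the toy image are hypothetical. A real CT pipeline would more likely use a library routine such as OpenCV's `cv2.Sobel` or SciPy's `scipy.ndimage.sobel`.

```python
import math

# Standard Sobel kernels for horizontal (x) and vertical (y) intensity change.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]


def sobel_magnitude(image):
    """Gradient magnitude of a 2-D grayscale image given as a list of rows.

    Border pixels are left at 0.0 for simplicity. The kernels are applied
    without flipping (cross-correlation). For Sobel kernels this only negates
    gx and gy, so the magnitude equals that of true convolution.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = gy = 0
            for i in range(3):
                for j in range(3):
                    p = image[r + i - 1][c + j - 1]
                    gx += SOBEL_X[i][j] * p
                    gy += SOBEL_Y[i][j] * p
            out[r][c] = math.hypot(gx, gy)  # sqrt(gx**2 + gy**2)
    return out


# A vertical edge: dark on the left, bright on the right.
img = [[0, 0, 10],
       [0, 0, 10],
       [0, 0, 10]]
print(sobel_magnitude(img)[1][1])  # 40.0
```

The vertical edge produces a strong horizontal gradient (gx = 40, gy = 0), while a uniform region gives a magnitude of 0. Edge maps like this highlight boundaries in an image, such as the boundaries of lung structures in a CT slice.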
